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Abstract 

A dynamically microprogrammable processor called MATHILDA 
is described. MATHILDA has been designed to be used as a tool in 
emulator and processor design research. It has a very general micro- 
instruction sequencing scheme, sophisticated masking and shifting 
capability, high speed local storage, a 64-bit wide main data path, a 
horizontally encoded microinstruction, and other facilities which make 
it reasonably well suited for this purpose. This paper presents an 
overview of the MATHILDA system. 



* This work is being partially supported by NATO Grant No. 755 
and the Danish Research Council Grant 51 1 -1 546. 



INTRODUCTION 



1 .0 Background and general design criteria 

In the spring of 1 971 the Department of Computer Science at the 
University of Aarhus was considering buying a standard minicomputer 
to control some peripherals and to simulate a standard medium- 
speed batch-terminal. At the same time a group was working with an 
integrated software and hardware description language called BPL [ 1 ] . 
To support this group, and, additionally, make the use- of such a 
minicomputer more flexible, it was instead decided to design and con- 
struct a microprogrammable mini within the department itself. The de- 
sign was started and went on during the summer 71 . The resulting machine, 
the RIKKE-0, was partially constructed, and started running in early 1972. 
In the meantime, a number of additional departmental projects were pro- 
posed, some of them were started while others were considered not to 
fit in with the present design. Among the latter projects were some from 
numerical analysis, the problem being the data width. RIKKE-0 has a 
short word size (16-bit) and in order to obtain an efficient implementa- 
tion of even standard arithmetic operations a wider word was needed. 

It was therefore suggested that a microprogrammed functional unit with a 
wider data path could be attached to RIKKE-0 as an l/O device, together 
with a wider memory. This organization would allow the problems of 
numerical analysis and those of the system-software to be more or less 
separated on the independent units. It was the functional unit which even- 
tually became the MATHILDA machine (a detailed description of this 
machine can be found in [2]). 

The design of MATHILDA went on during mid-1 972. It became apparent 
during the design phase that those features which were felt needed and 
those which came as side effects, covered and extended those of the pro- 
totype RIKKE-0. As it was previously decided to construct more RIKKEs 
by other funding, the MATHILDA design was adapted for the further con- 
structed RIKKEs with the (almost only) exception of the data widths being 
respectively 64-bit and 16-bit. Thus, together with a modularity and 
homogeneity in the hardware design, an economy in print-layout and con- 
struction was desired. As an example, all resources of the main data 
path are laid out on one print board, 8-bits wide. Two of the boards 
comprise one whole RIKKE bus structure with all registers, shifters, etc. 
Again, four of these RIKKE boards give the MATHILDA bus. Also the 
microprogrammed control unit for each machine is identical and is, in 
fact, also made up of several modules. 

Modularity and homogeneity were general criteria for the design. It was, 
from the very beginning, apparent that the design was going to be rather 
complex, and we had to keep the number of different features low. This 
was partwise achieved by using a standard "control-register group" for 
controlling the various resources of the system. 

Two general software ideas used in applications had a good deal of im- 
pact on the design: (1 ) the ability to run, on the higher level, a number of 
virtual machines which are to be multiprogrammed (or multiplexed) on the 
microlevel, and (2) the virtual machines are to be defined through several 



layers, e.g. in a block-structured environment. These ideas greatly 
influenced the design of the control unit, especially with respect to the 
capabilities of addressing. Many addressing facilities known at the virtual 
level, and in some cases more than those, are here visible on the micro 
level. 

Another criteria was to have a clean and consistent way of dealing with 
the timing problems. We decided not to force the speed, rather we would 
have a slower but less tricky machine. 



2. A BRIEF DESCRIPTION OF MATHILDA 

2. An overview 

One may consider MATHILDA, shown in Figure 1 , as composed by three 
layers, each of which is controlling the lower ones: 



a) 
b) 
c) 



The decoding and sequencing unit. 

Control-facilities. 

The main data path. 
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The MATHILDA Processor. 
Figure 1 . 



MATHILDA has a single 64-bit wide data path (called the Main Data 
Path, MDP) with 8 inputs consisting of various 64-bit wide registers, 
shifters, masks, and so on. It also has a number of control-information 
data paths and storage elements dedicated to those places where control 
of system resources can be exercised. The MDP can itself act as a data 
transformational element. All information transported over the MDP is 
subject to the following three transformations, in this order: 

1 ) Masking by a selected, stored mask which has been specified 
by the user. 

2) Shifting by a specified number of bit-positions via a right 
cyclic barrel shifter. 

3) Masking by either an end-off mask of specified amount 

from the right or left, or by a selected stored mask as in (1 ). 

Furthermore, during data transport, two functional units are operating 
on the transported information (as present between step 1 and 2 above), 
and without effecting the information, providing the system with control 
information (bus parity, test if the BUS = 0, and bit-encoding, see 2.3.3). 

The amount of parallelism in data transfers which MATHILDA allows for 
is restricted, in some sense, on the MDP. The parallelism does not con- 
cern the moving of actual data, but controlling the source, data path, 
destination of the transported data, and the control of functional units 
operating on the actual data. 

The control-facilities consist of storage-elements, data paths and func- 
tional units which allows for a high degree of parallelism and permits a 
variety of external, stored or computed control upon the units on the main 
data bus. The possible sources of control-information are the following: 

1 ) CM: the current microinstruction 

2) EX: an external register (from controlling processor) 

3) BE: the bit-encoder unit (see 2. 3. 3) 

4) SB: rightmost bits of the shifted bus 

5) SG: registers for residual or saved control 

It should be noted that item (1 ) above allows the user "immediate control 11 
over the system !, s resources whereas items (2-5) offer varying degrees of 
"residual control" in the context of Flynn and Rosin [3]. Furthermore, 
among the control facilities are system counters, local storage groups, 
and condition save register. There are also "status" and "snooper" fa- 
cilities, which can be used to gather data useful in computing statistics con- 
cerning the system and thus enhance the experimental nature of this ma- 
chine. 

A microinstruction is 64-bits wide. Particular fields of a microinstruction 
are highly encoded while others are minimally encoded. The minimally en- 
coded fields (from 1 to 3 bits) control particular highly used system re- 
sources and the instruction sequencing. The highly encoded fields (typ. 8- 
bits) are mostly used for exercising the control facilities associated with 
the arithmetic logical unit, the residual control standard groups, addressing 
of local storage elements, and so on. Up to 4 highly encoded microopera- 
tions can be specified in one microinstruction while at least 7 minimally 
encoded operations can occur. 



2. 1 The Main Data Path t MDP 

The MDP forms the base of the system, and we will begin a more detailed 
discussion of the system with it, the lowest level, where the actual data 
handling takes place, controlled by the upper levels. The MDP is shown 
in Figure 2. 
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In every microinstruction it is possible to take the contents of a specified 
64-bit wide source and mask it by use of a mask, BM, which has been 
specified by the user. The masked data is buffered in a latch, the output 
of which is termed the BUS. 



The data on the BUS is continuously being encoded, yielding various types 
of control-information (parity and bit-encoding, see 2 . 3. 3). Furthermore, 
the BUS information may directly be used for loading into various special 
destinations, e.g. output buffers. The data on the BUS is passed through 
the bus shifter, BS, wh'Hch when enabled shifts the data n bit position 
right cyclic; where 1 < n< 63. The physical shifter is constructed in a 



manner quite similar to the barrel shifter of the CDC 6400 series ma- 
chines. After shifting, the information is masked again, this time by a 
combined mask PM V PG, PM being similar to BM, and PG as an end-off 
mask generator, which allows the Bus-shifter combined with PG to result 
in a logical shift. The information after shifting and postshift-masking is 
buffered in a latch, called the SB Latch. The output of the SB Latch is 
called the Shifted Bus, SB, and is finally loaded into selected destinations. 

The Bus Mask, BM 

The Bus Mask, BM, is composed from two independently stored masks: 
BM = MA V MB. Both MA and MB are read from a store each containing 
16 such masks. The stores have their own address register, as shown in 
Figure 3. 
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The reason for the inclusion of double masks is that one group of 
masks (say MB) containing a no-mask and an all-mask can be used to 
enable/disable the other group of masks (say MA). Data is loaded into 
MA and MB from the SB. 



The Bus Shifter, BS 

The implementation of the BS is 64 parallel selectors: one selector for 
every position in the output which selects which of the 64 input lines Of 
the BUS is to be used (in a right cyclic connection) as the corresponding 
output bit. When no shift is required, the selectors all reside in a standard 
no shift position by disabling the selector via a dedicated bit in each 
microinstruction. When the BS is enabled, a BS Source Selector, BSS, 
see Figure 4, determines the source which is to be used as the shift 
specification. 
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The Postshift Masks j PM and PG. 

The output of the BS is again masked by the masks PM V PG, see 
Figure 5. Here PM s PA V PB is identical to BM with the single exception 
that PA and PB are loaded from the BUS whereas MA and MB are loaded 
from the SB. The F^G is a PROM whose contents can be combined to yield 
the 1 28 masks which are required to make the BS appear as a logical 
left/right shifter as well as a cyclic left/right shifter. The enabling of the 
PG is determined by the PM, i.e. PM contains both a no-mask and an all- 
mask, thus allowing it to be thought of as a switch for the operation of PG. 
The control of the PG, i.e., the specification of which mask to apply is 
similar to the specification of the shift amount for the BS, i.e. , a selector 
PGS, as shown in Figure 5, determines the source. 
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If the same source of control is used for BS and PG, the six low order 
bits will specify an amount n of right cyclic shift in BS, and the seventh 
bit will specify whether a logical rightshift of n places, or a logical 
leftshift of 64-n bit positions will be the result when F>G is enabled 
(i.e. , PM = PA V PB = no mask). 

2.2 Additional MDP resources 

The other MDP resources, shown in Figure 2, will now be briefly dis- 
cussed. These resources are either possible sources of a bus-transport, 
or destinations for a transport, or both. Except for the arithmetic-logic 
unit, ALU, they are all some sort of registers, some are pure storage 
elements (WA, WB, LR), some are shifters (AS, VS, DS), and the rest 
for I/O communication (IA, IB, OA, OB, OC, OD). 

Working registers (WA and WB) and their Loading Masks (LA and LB) 
The system contains two local storages, WA and WB both containing 
256 words. They are composed of 80 nsec, 256 x 1 bit chips. As they 
are identical we will only consider one of them, say WA, which is shown 
in Figure 6. Addressing of WA is made by a 8-bit pointer, WAP, which 
determines which location to read or write. WAP is, in fact, composed 
of two 4-bit pointers which may be coupled together (or be decoupled), 
which allows for considering WA as either 256 elements, or as 1 6 groups 
of 1 6 registers. 
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The writing of data into WA is also masked by means of a loading mask, 
LA. LA makes it possible to load only selected fields of the addressed 
WA-word, without affecting the remaining word. This permits the con- 
struction of say the result of a floating point operation, by loading the 
fields of the packed representation separately as they are being computed. 
LA (and LB on WB) is again a group of 1 6 masks with its own address- 
mechanism. (In fact LA is identical to a register group as shown in 
Figure 10.) 
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The Arithmetic and Logical unit, ALU 

This is a standard 64-bit wide ALU, see Figure 7, operating within 
the cycle time of the machine. It is implemented in 3 level carry look 
ahead logic. It can give 16 arithmetic and 16 logical operations on two 
operands. The operation to be performed and the carry-input is specified 
by the contents of a function register, ALF, which can be preset to cer- 
tain standard functions or loaded from a standard group. 
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The inputs to the ALU, i. e. LR and AS, are discussed below. 

The Local Registers, LR 

One of the inputs to the ALU is the output of a 4 element register group, 
the Local Registers, LR, see Figure 8. LR has independent address 
mechanisms for reading and writing. The 2-bit read-address LROP 
(LR-output pointer) can be specified (loaded) in such a way that it allows 
for a fast small table-look-up. The input address pointer, LRIP, is 
loaded similarly and both addresses can be incremented and decremented 
(wrap-around). The DS (V:V+1 ) input to LRIP is discussed below. LR 
has no direct access to the bus, but its contents must be gated through 
the ALU. 
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The Accumulator Shifter, AS 

The other input to ALU is the Accumulator Shifter, AS, see Figure 9. 
The AS is a register which can shift one position left or right in one 
operation. Any 1 of 8 possible specified sources can be loaded into the 
vacated bit position. One of these inputs may be an arbitrary selected 
bit of AS itself, thus allowing AS to act as a cyclic shifter of an arbitrary 
length n, 1 < n < 64. AS has no direct path to the bus but must gate its 
contents through the ALU. 
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The Variable Width Shifter, VS 

The variable width shifter, VS, is logically identical to the AS, but 
has direct output to the bus. Together with the AS, it can form a cyclic 
shifter of arbitrary length n, 1 < n < 1 28. 

The Double Shifter, PS 

The Double Shifter, DS, is also logically identical to AS, except for 
the fact that it shifts two positions at a time. The two variably addressed 
bit positions (besides from being used as input to vacated bits in shifts) 
may be used to load LRIP and LROP as shown in Figure 8. Thus the 64-bit 
content of DS may be used in 32 steps to control algorithms two bits in 
one step (e.g. in a multiplication algorithm) by table look-up in LR. The 
DS, AS, and VS can be interconnected so that two 32-bit words can be 
merged (interlaced) together and so that one 64-bit word can be decom- 
posed in the reverse fashion. 

The Status Port, SP 

This port provides a path between various control and microinstruction 

sequencing facilities and the MDP. Its functioning is explained in section 

2.3.5. 



The Input Ports IA and IB 

Each of the ports can potentially access 1 6 different input devices, 

addressed by associated device address-registers IAD and IAB. 
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The Output Ports OA and OB 

Similarly to the input ports, OA and OB can access 1 6 devices, al- 
though ports may be dedicated to a single device in the case that a 
device is expected to be very heavily used. As an example a standard 
memory will be considered as such a device, where an input port and 
an output port may be dedicated. 

The Output Ports, OC and OP 

These are identical to OA and OB, except for the fact that they are 
loaded from the BUS and not as normal destinations of the SB, but 
loading is specified by microoperations. 

2. 3 Control facilities 

As it can be seen from the previous description of the MDP and its 
associated storage and functional elements, most of these resources 
require some sort of control-information. It may be an address (e.g. 
for WA and \AO), data for a functional unit (e.g. a shift specification 
for BS or mask specification for PG), a functional specification (e.g. 
control of ALU function) or a selector-specification (e.g. for vacated 
bit input to a shifter). 

2.3.1 The standard groups 

As one of the main objectives of the design was to offer a host upon 
which a wide variety of experiments could be conducted, it was decided 
to delay the binding of as many design decisions as possible and to give 
the user capability to bind some decisions. With respect to the specifi- 
cation of control information for all functional units, the user is able to 
choose between immediate control and some form of residual control. 

Each resource of the system can be controlled from a field within any 
particular microinstruction, thus allowing the immediate control source. 
Another such control source might be some storage element that could 
have been set up by a particular microinstruction to be used by a later 
microinstruction, i.e., residual control. Possible residual control 
sources might be from the external world (e.g. another computer), the 
data carried on the bus, or a result of some computation thereon. In MA- 
THILDA, control information may be selected among four possible sour- 
ces, among which CM (the current microinstruction), EX (external re- 
gister), and the output of a RG (a register group) form three inputs to a 
control selector. The fourth input is either SB (least significant bits of 
the shifted bus) or the output of BE (the bit-encoder, see section 2.3.3), 
the choice of which is bound by the designers depending on the needs of the 
unit in question. 

The selection of the control source is normally specified in the very same 
microinstruction which specifies the use of a particular resource. The 
control information normally is buffered (e.g. as in the ALU-function 
buffer, ALF, a working register address pointer, or in a resource itself 
as in the case with system counters). Only where the control information 
is directly affecting the bus transport, as with BS and f^G 9 was it ne- 
cessary to bind the selection of source prior to the use of the information. 
This is obviously because the selected information is not being buffered, 
but directly affects the bus transport when the unit is enabled. 
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The amount of residual control facilities is large and could have been 
rather complex. However, a great deal of simplicity and modularity has 
been achieved both logically and physically by use of a uniform residual 
control concept, the standard group and its selector, see Figure 1 0. 
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A Standard Group is a storage element of 16 words with an address 
mechanism. The local storage associated with a standard group is 
called a Register Group, RG, and is of an appropriate width depending 
on which unit the standard group is associated with. The storage is used 
for residual control, and in these cases where the information residing 
there is part of a fixed environment, the loading of the storage takes place 
from the SB. However, in connection with some units, it was more na- 
tural to load the storage from the unit itself, so that the information there 
could be saved and later restored. Loading a standard group from the SB 
then becomes a two-step proces: first load the unit from SB, and then 
"save it ,f . 



The reading of and writing into the storage of a register group is con- 
trolled by an address or pointer contained in a register, the RGP which 
can be loaded, cleared, incremented or decremented. Loading of a RGP 
can also take place from four sources, two of which are local to the re- 
gister group, the Save-1 and Save-2 registers. The timing is so that it 
is possible in one microoperation to save the contents of a RGP in its 
Save-2 (S2)-register, and load new data into RGP (which might be the 
contents of its SI -register). 

2. 3. 2 System counters CA and CB 

Among the control-information units are also two 16-bit system counters 
CA and CB, which can be used in controlling algorithms, e.g. for counting 
in loops, etc. , see Figure 1 1 . It should be noted in Figure 1 1 that the 
content of a counter can be saved and later restored, so that one counter 
can control up to 1 6 levels of embedded loops. 
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The counter itself can be incremented, decremented and cleared, and the 
actual content of the counter can be tested towards zero, and some of the 
bits can be tested individually. 

The only difference between CA and CB is that where the sources for 
loading CA are: CM, EX, SB, and CA-save-group, the similar sources 
for CB are CM, EX, BE, and CB-save-group (BE is the output of the 
bit-encoder, described below). 

2. 3. 3 The Bit-encoder BE 



One resource of the MATHILDA is to our knowledge a new invention. It 
is what we call the Bit-encoder, BE. In many algorithms it would be use- 
ful to have easy access to information about a given bit-pattern as to 
where is the first bit on, and where is the last: 
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We call the quantities I and m respectively LSB (least significant 
bit) and MSB (most significant bit). The bit-encoder can provide the 
user with such numbers, and certain computations using LSB and MSB. 
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The Bit-encoder is continuously encoding the information on the BUS, 
yielding the two quantities I and m corresponding to the transported 
bit-pattern. By a microoperation these can be loaded into two registers 
LSBi^ and MSBx . The computational network of the Bit-encoder further 
contains two additional registers LSB 2 and MSBg , which can be loaded 
from LSB-l and MSB X respectively or be interchanged with these. As- 
suming LSBg and MSB 2 contain the encodings from a previous BUS- 
transport, the circuitry of the Bit-encoder continuously computes the 
following quantities: 
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The output of the BE may be used in various control elements of the 
system. Furthermore the Bit-encoder always provides the following 
testable conditions: 
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Based on these conditions, it is possible to compare some of the charac- 
teristics of the two bit-patterns whose encodings have been loaded to: 
a) direct decisions about the algorithm, b) choose BE-function, or c) 
interchange (LSB X , MSB X ) and (LSB 2 , MSB 2 ) before selecting the 
BE-function and use the selected information. 

Since bit-patterns and bit-matrices play an intensive role in many non- 
numerical algorithms, fundamental operations providing the encodings 
I and m, may prove to be useful on the virtual machine- 1 eve I, and also 
high-level languages. 

2. 3. 4 Condition save registers, CR, and switches, KA,KB,KC and KD 
A number of system conditions (almost 1 28) are continuously sampled and 
in every instruction any one of these may be selected for testing. The 
main use of such a selected condition is of course in the sequencing which 
will be described later (s. 2. 4. 4), but it may be useful to be able to save 
a condition as it arises to be used in a later sequencing decision. The 
Condition save registers (CR) are a 1 -bit wide (save register) group where 
the value of a condition may be saved and later retrieved, see Figure 1 3. 
Switches exist in two variants: KA and KB are console-switches, KC and 
KD are programmable switches that furthermore also can be loaded with the 
selected condition. 
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2. 3. 5 The Status Facility 

The Status Facility establishes a data path between various control re- 
gisters, address pointers, functional units (e.g. counters, and the bit- 
encoder) and the MDP. The Status Facility provides a basis for gathering 
both data concerning the operation of the system and data for use in al- 
gorithmic processes. 
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64 different sources of data are fed into a status selector. Which partic- 
ular source is to be gated onto the MDP as the bus-input is determined 
by the content of the Status Port Pointer (SPP) register. It should be 
noted here that st is a 16-bit field from a microinstruction so that con- 
stants can be put on the MDP with relative ease. 



2. 3. 6 The Snooper Facility 

The Snooper .Facility consists of a) a Snooper Control Store and b) 
Snooper Resources (e. g. 2 groups of 1 6-registers, counters, and com- 
parators). The Snooper unit works in the following way: when the address 
of the next microinstruction to be executed is sent to the MATHILDA 
Control Store address buffer, it is also gated into the Snooper Control 
Store address buffer; at the same time the microinstruction is fetched 
so that it can be executed, the contents of its associated Snooper Control 
Store location is fetched; in parallel with the microinstruction being 
executed, the contents of its associated Snooper Control Store just fetched 
is used to control the operation of the Snooper Resources. Snooper Con- 
trol Store (80 nanosecond storage) is 16-bit wide and has the same 
number of words as the MATHILDA Control Store. A snooper word can 
specify for example, any two registers which can be counted up (or 
down). Vhe Snooper Facilities can be written or read through the normal 
input/output ports of the system in much the same way as the Status Fa- 
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cilities. Snooper Control Store is writable so that different data gathering 
routines can be associated with the same segment of microcode without 
changing the microcode. The user is allowed to establish the correspondence 
between any particular snooper resource and the routine upon which it is 
snooping. 

2. 4 Microinstruction sequencing and execution 

One microinstruction execution of the machine may be considered as con- 
sisting of four major sequentially executed steps: 

A: Microinstruction fetch 

B: Data transport on MDP 

C: Execution of microoperations 

D: Address calculation (for the next microinstruction) 

Steps B, C, and D are controlled by fields within the 64 bit wide micro- 
instruction. Logically, to the user, these steps and substeps within these 
may be considered sequentially within the microinstruction execution, 
although of course many of the activities take place in parallel in the 
physical implementation. The execution of one microinstruction is to be 
considered as totally completed before the next microinstruction is exe- 
cuted, i.e., actions initiated by the execution of a particular microopera- 
tion do not span several microinstruction executions. 

The microinstruction consists of three major fields, corresponding to the 
control of the previously mentioned steps B, C, and D: 
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The dealing of dynamic conditions has been made consistent, 
can be run in two modes, under program control: 



The machine 



Long cycle: 



Any condition arising as an effect of the execution of 
steps A, B, C can be tested and used for sequencing 
in step D. 



Short cycle: 



All conditions used in the instruction is of the state of 
the machine immediately prior to step A. 



2.4.1 Microinstruction fetch (step A) 

The control store may consist of up to 4096 words of 64-bits (80 nS). 
Initially, 51 2 words of control store has been implemented. It is writeable 
under program-control from an output port of the system itself, or from a 
16-word deadstart PROM. The control store is addressed by the contents 
of an address buffer, which is normally determined by the sequencing 
part of the previously executed microinstruction. [However, after either 
deadstart or an interrupt, the control store address buffer is forced to 
contain a zero. ] 
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2. 4. 2 MDP-transport (step B) 

The specification of the MDP transport is given in this field. The source 

of the transport, which is the input to the Bus Selector and latch (see 

Figure 2), is specified by 3-bits as is the destination of the SB. 1 -bit 

is used to enable the BS and, if necessary, data is supplied from the 

C-field. 



The sub-steps of step B are: 



B 3 
B 4 

B 5 
B 6 

B 7 



Selection of^source 

Masking by 'MB/ 

Buffering in the BUS-latch 

Shifting by BS if enabled 

Masking by PM V PG 

Buffering in the SB-latch 

Loading into a selected MDP destination 



2.4.3 Execution of microoperations (step C) 
The C-field is divided as follows: 



Data and mops 


Shifter control 


63 29 


28 23 



The n shifter control 11 field contains three 2-bit fields associated with 
AS, VS, and DS. Each 2-bit fields specifies: shift left, shift right, 
load or H do nothing 1 ', for each shifter respectively. The. "data and mops 11 
field is again subdivided into several fields as follows: 



F1 


SI 


M2| F2 


M3 


F3 


S3 


M4 


F4 



Fields F1 , F2, F3, and F4 are each 7 bits wide, fields S1 and S2 are 
2 bits wide, and fields M2, M3, and M4 are 1 -bit fields. The mode-bits 
M2, M3, and M4 determines whether F2, F3, and F4 are to be decoded as 
microoperations ("mops 11 ), or to be used as data. F1 is al- 
ways decoded as a "mop 11 , so up to four "mops 11 in addition to the AS, VS, 
and DS control may be specified in one instruction. Many places in the 
system may be supplied with data from the fields F2, F3, and F4. Most 
often the data is supplied through a selector, which also needs control. 
Furthermore, the loading of the data into its destination is specified by a 
"mop 11 . The standard f, set-up M for such a load is to specify the load-mop 
in say F1 , the selector-setting in SI , and if the setting requires data 
from CM, this is taken from F2, i.e. , M2 is to be set to disable the de- 
coding of F2 as a mop. Similarly, F3, S3, M4, and F4 may be used together 
to control some resources. 



* ) The MDP destinations AS, VS, and DS have their own separate 
load/shift control bits in the C field. 
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Data requirements are normally 4-bit or 6-bit wide and are usually 
taken from F2 and F4. PG requires 7-bits (F2), and CA and CB need 
16 bits which are supplied from the fields F2, F3, and S3 concatenated, 
acting as an "expanded F2 n field* F3 is used for BS shift-specification, 
when the BSS specifies CM as the data source, and BS is being enabled. 
Furthermore data from F3 and F4 may be used in sequencing (step D) for 
addressing computation. 

When F1 , F2, F3, or F4 are being decoded as mops, they can specify a 
variety of actions around the whole system. The decoding allows for 51 2 
different interpretations, some are only dupl ications of the same action, 
so that some space-conflicts among mops and data can be avoided. 

Each mop has a clock-specification, dividing the set of mops into two 
classes, clock 1 -mops and clock 2-mops. The clocks are such that the 
execution of a clock 1 -mop is completely finished before clock 2-mops are 
being initiated. 

The execution of operations in step C can be thouc^it of as being performed 
in the following substeps: 

CI : Gate the information from S-fields and from F 2, 

F3, F4 fields to their destinations irrespective of 
their expected or non-expected use. 

C2: Decode the enabled F-fields depending on the 
enabling by the respective M fields. 

C3: Execute (clock) the specified clock 1 -mops 

C4: Execute the specified load/shift actions in AS, 
VS, and DS. 

C5: Execute (clock) the specified clock 2-mops 

The class division of mops with respect to clocking, allows as an example 
for the loading of data into a particular RG, and then changing that par- 
ticular RG's address pointer within the same microinstruction. 

2. 4. 4 Address-calculation (step D) 

The determination of the address of the next microinstruction to be executed 

is an implementation of the well known if -then-else construction. The 

selection is among two modes of address-calculations (rather than addresses 

themselves). Possible modes of address-calculations (as well as the way 

the if-then-else clause is realized by the conditional use of the A or the 

A £ . ) are illustrated in Figure 1 5. 
false ^ 

The D-field of the microinstruction consists of the following subfields: 



BISB 


C1SB 


Condition 
Selection 


Afalse 
selection 


^true 
selection 
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where A and A f . are 3-bits, Condition selection is 7-bits, BISB 

is 2-bitsY* and CISB a ii e a 1 -bit field. The particular condition which is 
selected by the 7 CISB bits is denoted by "c 11 in Figure 1 5. 
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Addressing thus allows straightforward transfer of control to neighbour 
instructions (A-1 , and A+1 , where A is the address of the current in- 
struction). The arithemtical-logical unit, CUAL, provides an address 
which is computed from A and B (address-constants as described below); 
such computed addresses provide, for example, relative and absolute 
addressing. Furthermore the current address may be pushed onto 1 of 2 
return jump stacks (RA and RB) each of which is 1 6 levels deep. The top 
of either stack may be used in later steps through adders. The RA and RB 
stacks are automatically popped when they are being selected. Finally 
the Save address-buffer (SA) provides a path between the Shifted Bus 
and the address-bus, and the external buffer (EX) provides a path from an 
external source. A force-zero possibility exists, it is used on deadstart 
and for "hard 11 interrupts. 
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The B-input, which can be used in various address computations, is 
specified to be one of the following four sources, determined by the 
BISB-field: 

O: A constant zero 

t 1 : Sign-extended version of t (t:6 bits from F4 of 
C-field) 
To t: T concatenated with t (T:6 bits from F3 of C-field) 
OoSA(0:5): The zero concatenated with the least significant 
part of the save-address buffer* 

The CiSB-bit is used in specification of the carry-input to the address 
(-H ) returns without need for non-zero B-input, and jumps to one of two 
consecutive locations depending on a condition. 

The execution of step D consists of the following logical substeps: 

D 1 : Choose the selected condition, c. [in short cycle 
c is the value of the selected condition prior to 
step A, in long cycle the new value.] 

D*. Select the carry-in and B-input into the return jump 
stack adders and the CUAL. 

D~: Compute the results of the return jump stack addition 
and the CUAL function. 

D , : Select the new address, using A if c = 1 or A f if 
c = as the bus-selection, and load the address- 
buffers. 

D-: If RA and RB has been selected then pop the stack 
that was used. 

D fi : If a force-zero situation has occurred then load the 
interrupt recovery address, and clear the address- 
buffers. 
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3. CONCLUSIONS AND EXPERIENCE 



3.1 Design and hardware considerations 

The design of MATHILDA as an experimental tool for research in emulation 
was partially initiated based on design experience gained earlier by 
another group in the department on the construction of the RlKKE-0 ma- 
chine. It was felt at the time when the MATHILDA design process was 
started, that no commercially marketed processor was available within 
the funding we could afford (or apply for), when justified as an initially 
pure research processor. 

As a department in a non-engineering university, our hardware-staff was 
very restricted (2 engineers + 1 technical assistant). This staff was main- 
ly intended for minor construction and interfacings, together with service 
on all standard equipment in the department. The design was originally in- 
tended to be for a functional unit, very intimately connected to the con- 
trolling RIKKE-O. However, during the design process a number of complex 
facilities were added. Thus the functional unit grew into a selfcontained 
processor and, of course, our original time schedule for the construction 
was not satisfied. Besides the growth in design, unexpected other duties 
of the technicians, delivery problems, lack of project management experience, 
and the fact that the project was, in retrospect, a bit overambitious for the 
staff, delayed the project. It was, however, decided to test the design on 
the 1 6-bit RIKKE version before the construction of the 64-bit MATHILDA. 

The basic design, including the bus structure and the concepts of standard 
groups bit encoder etc. ) was made by the authors in the period from 
February to August, 1 972. The sequencing part and instruction format was 
designed in the fall, while the printboards for the bus-structure were laid 
out and actual mounting of RIKKE started. By August, 1973, the control 
unit and control facilities (register groups) were ready for initial hardware 
testing. In late October, the 16-bit bus structure was added for testing, and 
in January 1 974 initial testing of the RIKKE processor was considered com- 
pleted. However no main memory was added before March due again to 
delivery problems. At the time of this writing (June 1 974), the RIKKE 
has interfaced to it a high speed paper tape reader and a small mini-printer. 
The RIKKE machine is complete as the MATHILDA machine described in this 
document except that the following facilities are not yet implemented: 
Snoopers, Status, Bit-Encoder, and the Force-0 address and IRA facility. 

The construction of MATHILDA was started in August, 1 973, but since 
testing of the MDP (and the print boards) was not done before January 1 974, 
the n go ahead' 1 for the construction of the 64-bit bus structure could not be 
given before then. Here again delivery problems for the print boards 
caused a further two-month delay. Currently, the mounting of the bus struc- 
ture is halfway through. 

The cost of the construction is* not easily calculated. An estimated cost of 
$ 1 2,000 for components and $ 1 0,000 for mounting assistance was granted 
by the Danish Research Council. The support from the staff-technicians 
cannot be computed that simply because of their other duties and projects. 
An estimate of 2 man-years of design and documentation from the technicians 
might be adequate. The experience gained in the department during this pro- 
cess cannot be underestimated, we learned a lot about what to do, and 
especially what not to do in such a project. 
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3. 2 Experience with programming 

Since the summer 73 as assembler, MARIA [4], and a simulator of the 
system have been in use. Four simulators for RIKKE [5] and MATHILDA 
[6,7] have been written in addition to emulators for an O-code machine 
for BCPL [ 8 ] and a P-code machine for Pascal [9]. Furthermore, basic 
software (bootstrap loader, normal izer, etc. ) and a large number of routines 
for the implementation of arithmetics has been implemented. Experience 
with the programming of the processors shows that the design seems to be 
suitable for experimental purposes, although coding is not so straight- 
forward because of the horizontal nature of the microinstructions as on 
processors with more highly encoded (vertical) but more primitive instruc- 
tion formats (e.g. the B1 700). Certain facilities of the system has proven to 
be extremely useful, the easy and natural sequencing possibi I ities, the BS, 
BM, PG, and especially the BE. 

Although the design of MATHILDA may seem somewhat complicated, ex- 
perience from a course on computer architecture given at the University 
of Southwestern Louisiana indicates that the students learned the MATHIL- 
DA design with reasonable ease. We are at present only in the very be- 
ginning of the real use of the RIKKE and MATHILDA processors, and 
only in the future can real evaluations of the suitability be made. 
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Appendix A. 

GLOBAL SYSTEM DESIGN, 

A. 1 The MATHILDA-RIKKE-Wide Memory system. 

Although the organization of the MATHILDA processor is the key issue 
of this report, and the use of it is in no essential way constrained to 
the specific environment it is intended to operate within, it seems rea- 
sonable to explain the actual proposed system. 

The fact that MATHILDA is to be treated by RIKKE, as an I/O device, 
as shown in Figure 16, offers a great flexibility and proposed some 
interesting projects. A particular problem associated with such an 
interconnection was whether synchronisation was necessary or not. It 
was decided that, except for a few control signals, every information 
interchange should be queued up so that asynchronous operation was 
possible. The same principle was also used with respect to a 64-bit 
Wide Memory, this being a common resource to the system, and not 
dedicated to MATHILDA. 

In this way the system was designed to be three independent units ope- 
rating asynchronously and communicating through queues: 
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The Wide Memory is exclusively controlled by RIKKE, i. e. it is the 
only unit that can deliver requests for memory access. The data trans- 
fer can take place on a number of memory ports. RIKKE itself will be 
attached to a set of l/O-ports of the memory system. RIKKE also has 
its own private memory (up to 64K 16-bit words). Standard l/O-devices 
are therefore intended to be coupled either directly to RIKKE or toother 
minicomputers communicating with it. 

The idea of operation is to let RIKKE decode virtual machine instructions, 
and to do all address calculations involved, while MATHILDA was to 
Perform the actual data transformation required. The philosophy of the 
design of MATHILDA was to give it capabilities for doing transforma- 
tions upon a wide data word. Complicated "macro 11 routines could be 
implemented in MATHILDA microcode and thereby define a !, Nano-ma- 
chine 11 which can be called upon from the microcode in RIKKE to define 
the complete virtual machine. In fact it is not a nano-machine in the QM-1 
sense [ 1 0], but it may be operated in such a way that a similar effect 
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is achieved. Among the differences in these approaches is that RIK- 
KE, running in parallel with MATHILDA, normally will be ahead of 
it, preparing instructions and operands for it. 

A. 2 Projects related to the MATHILDA-RIKKE-Wide Memory system. 

The cooperation of RIKKE-MATHILDA-Wide Memory in a synchronous 
or asynchronous fashion seems to offer fundamental questions similar 
to those posed in other recent types of computer architecture, like 
the Illiac-IV or the CDC-Star computer systems. We are undertaking 
a wide range of projects both at the University of Aarhus in Denmark 
and the University of Southwestern Louisiana in Lafayette, Louisiana 
which are related to this system. 

A. 2. 1 Projects within the computer systems and language area. 
A major topic in the system area is the construction of distributed sys- 
tems architecture, that is, for example to evaluate operating systems 
and compilers for computer complexes with more than one processor. 
Notice here that the present system differs in a fundamental way from 
standard multi-unit, MATHILDA is a resource to a RIKKE, or a system 
of RIKKE 1 s. 

It is, of course, not at all obvious what the architecture of the best ma- 
chine should be in order to assist in the efficient construction of various 
complex systems and architectures. We intend to investigate the possi- 
bility of formalizing the capability of various host architectures, to i- 
dentify what primitives are needed and can be implemented in microcode 
to support compilers, high level languages and operating systems. Thus 
MATHILDA is a tool in such research and not the answer to the ques- 
tions which are raised. 

An immediate project is the direct emulation of different virtual machines, 
either standard computer systems or given experimental machines e.g. 
the BCPL language running on emulated virtual O-code machine [ 8 ]. 
Such an emulator has been implemented on the standalone RIKKE, this 
allowing a simple transfer of already accessible software in BCPL to 
be available on the system, (e.g. [ 11 ]). 

The RIKKE-MATHILDA-Wide Memory system offers a complex tool for 
handling of composite data structures. Most computer systems today o- 
perate within a multilevel store environment. A basic question within 
the field of data structures concerns how the distribution of different 
primitives in a program to the appropriate kind of store and processors 
is to be made. As an example, the handling of list structures may be 
performed in RIKKE, while MATHILDA is doing the computations at the 
nodes. Also in the support of special peripheral devices on RIKKE, e. g. 
a graphical display or an electrostatic plotter, there is a demand for 
high speed access to large and complex data structures. 

A core project in the systems area is to define and implement a micro- 
monitor, resident in RIKKE control store, to allocate the resources of 
the system, I/O as well as processors, and which will allow multipro- 
gramming among emulators, [ 12]. 
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Another project concerns the implementation of the language Pascal 
[l3 J to support the immediate applications. The basic tool in this 
and other language implementations is intended to be a compiler- gene- 
rator using LR(k) techniques (the BOBS - system[ 14]), which has al- 
ready been implemented at Aarhus, and being transferred to the RIK- 
KE-MATH1LDA system. 

The target system for the Pascal-compiler is a stack-machine (called 
P-code machine [ 9]), running in such a way that evaluation stacks 
exists in RIKKE as well as in MATHILDA. 

The implementation of Pascal is considered as the first step in an ex- 
periment with developing a language for numerical algorithms in non- 
standard arithmetic. The goal of this project is to define a language 
where several real-datatypes can co-exist and be controlled. This 
will allow the experimentation with various "virtual arithmetic units 11 , 
[15], for non-standard arithmetic to be performed on MATHILDA, con- 
trolled by RIKKE. There are also projects being undertaken 
which deal with an integrated approach to fault tolerant virtual systems 
and their performance measurement. 

A. 2. 2 Projects in numerical analysis. 

We are currently undertaking projects where MATHILDA could be ap- 
plied in making non-standard arithmetic available. An outline of such 
projects may be found in [16], Many subroutines in both high-or low- 
level languages have been written to allow the use of extended range, 
extended precision, significant digit, unnormalized interval, rational 
and complex arithmetic. Special purpose hardware is much too expens- 
ive, but the general structure and microprogrammability of MATHILDA 
certainly will offer more efficient implementation of the arithmetic pri- 
mitives. The overall structure of the system will allow extensive ex- 
periments on various arithmetics, by changing underlying structure. 

A key question in the study and implementation of machine arithmetics 
as on higher levels will be to extract the fundamental "core 11 of the 
operations, i. e. to determine a fundamental set of operations on diffe- 
rent bases in such a way that a structured (layered) development of any 
particular arithmetic will follow. 

This approach will be combined with the M top-down ,f analysis from the 
application and language point of view. The idea being to move step-wise 
the border between the implemented virtual machine towards the higher 
level machine. The projects within the systems area will fit those in 
the numerical analysis part, and provide fundamental software needed 
for these projects. 

As one of the major problems in the application of digital computers in 
numerical analysis, is representation of numbers, and the operations 
upon them, the most immediate project will be to conduct theoretical 
and experimental studies in machine arithmetic. 
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Appendix B: 

Physical summary of MATHILDA. 

MATHILDA is mounted on 7 frames 8 X 45 X 60 cm, each of which is 
turnable around a vertical axis like pages in a book. Each frame is 
closed from both sides with removable boards of plexiglas, thusfor- 
ming a closed box. Each "box" is equipped with three small ventila- 
tors blowing a stream of air up through the frame. Signal-interconnec- 
tions between the frames are through standard cables containing 20 sig- 
nal-ground wound pairs of wires. Plugs are mounted along the vertical 
sides of the frames. 

The printboards are two-sided and are either special prints or standard 
prints only containing power and ground where signal-interconnections 
are in the wiring. Special purpose prints exists only in three variants: 

a) 8-bit wide of the whole MDP, 25 X 40 cm, 

b) 4 X (4 bit of a standard register group), 15 X 40 cm, 

c) 256 words of control-store, 64-bit wide, 15 X 40 cm. 

All of the special purpose printboards furthermore contain some room 
for additional circuitry. 

No attempt has been made to carry signals to the edges of the boards for 
board-to-board and board-to-plug interconnections, all such connections 
are made with wires from the proper places on the prints. 

All circuits are from the Tl 74 series or equivalent. A large amount of 
signals (data buffers) have been made visible on the boards by light- 
emitting diodes (for diagnosis and step-mode). 

A survey of the content on the frames are given below: 
Frame 0: (Sequencing and Control Store) 

1 pc 30 x 40 standard print containing sequencing, clock- 
generators and deadstart. Masterclock (40 nS steps) and 
its input into a shift register which pulses various clocks. 
At present there are 9 steps in one (short) cycle, but 7 steps 
should be the ultimate (short) cycle. 

2 pc type c (above) prints control store, 1 28 pc TI 74200 .+ ■ 
amplifiers. 

Frame 1: (CA, CB, WAP, WBP, LA, LB) 

4pc type b prints (standard-groups). 
Frame 2: (Standard groups for shifters, PG, BS, BM, PM, AL, etc. ) 

3pc type b prints + 1 pc 15 X 40 standard print. 
Frame 3: (End-connection for bus modules, BE + various). 

1 pc type b print + standard print. 

Frame 4-7:(Each contains 16 bits of MDP with all registers and ALU) 

2 pc type a prints + 1 pc 10 X 40 standard print on each frame. 

Power supply: 8 pc, 5V, max 15 amp. 

Console: (preliminary) 

Buttons: Deadstart, Run, Stop after deadstart, Step-Stop 
(two pushes pr. cycle, first instruction fetch, second exe- 
cution). 

Switches: KA, KB conditions, 2 stop-switches ( stop on exe- 
cution of specific mops, i. e. not as testable condition). 

Deadstart: 16 words of battery-driven CMOS-PROM is copied into con- 
trol store repeatedly. Execution is forced to location zero. 
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